Project Gutenberg

Established: 1 December 1971 (first document posted)[1]
Collection size: Over 34,000 documents
Director: Michael S. Hart
Website: www.gutenberg.org

Project Gutenberg, abbreviated as PG, is a volunteer effort to digitize and archive cultural works, to "encourage the creation and distribution of eBooks."[2] Founded in 1971 by Michael S. Hart, it is the oldest digital library.[3] Most of the items in its collection are the full texts of public domain books. The project tries to make these as free as possible, in long-lasting, open formats that can be used on almost any computer. As of December 2009, Project Gutenberg claimed over 34,000 items in its collection. Project Gutenberg is affiliated with many independent organizations that share its ideals and have been given permission to use the Project Gutenberg trademark.

Wherever possible, the releases are available in plain text, but other formats are included, such as HTML, PDF, EPUB, MOBI, and Plucker. Most releases are in English, but many non-English works are also available. Several affiliated projects provide additional content, including regional and language-specific works. Project Gutenberg is also closely affiliated with Distributed Proofreaders, an Internet-based community for proofreading scanned texts.

History

Michael Hart (left) and Gregory Newby (right) of Project Gutenberg, 2006

Project Gutenberg was started by Michael Hart in 1971 with the digitization of the United States Declaration of Independence.[4] Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. Through friendly operators, he received an account with a virtually unlimited amount of computer time; its value at that time has since been variously estimated at $100,000 or $100,000,000.[4] Hart has said he wanted to "give back" this gift by doing something that could be considered of great value. His initial goal was to make the 10,000 most consulted books available to the public at little or no charge, and to do so by the end of the 20th century.[5]

This particular computer was one of the 15 nodes on the computer network that would become the Internet. Hart believed that computers would one day be accessible to the general public and decided to make works of literature available in electronic form for free. He used a copy of the United States Declaration of Independence that he had in his backpack, and this became the first Project Gutenberg e-text. He named the project after Johannes Gutenberg, the fifteenth-century German printer who propelled the movable type printing press revolution.

By the mid-1990s, Hart was running Project Gutenberg from Illinois Benedictine College. More volunteers had joined the effort. All of the text was entered manually until 1989, when image scanners and optical character recognition software improved and became more widely available, making book scanning more feasible.[6] Hart later came to an arrangement with Carnegie Mellon University, which agreed to administer Project Gutenberg's finances. As the volume of e-texts increased, volunteers began to take over the day-to-day operations that Hart had run.

Pietro Di Miceli, an Italian volunteer, developed and administered the first Project Gutenberg website and started the development of the project's online catalog. In his ten years in this role (1994–2004), the project's web pages won a number of awards and were often featured in "best of the Web" listings, contributing to the project's popularity.[7]

Recent developments

In 2000, a non-profit corporation, the Project Gutenberg Literary Archive Foundation, Inc., was chartered in Mississippi to handle the project's legal needs. Donations to it are tax-deductible. Long-time Project Gutenberg volunteer Gregory Newby became the foundation's first CEO.[8]

Also in 2000, Charles Franks founded Distributed Proofreaders (DP), which allowed the proofreading of scanned texts to be distributed among many volunteers over the Internet. This effort greatly increased the number and variety of texts being added to Project Gutenberg, and made it easier for new volunteers to start contributing. DP became officially affiliated with Project Gutenberg in 2002.[9] As of 2007, the 10,000+ DP-contributed books made up almost a third of the nearly 34,000 books in Project Gutenberg.

Starting in 2004, an improved online catalog made Project Gutenberg content easier to browse, access and hyperlink. Project Gutenberg is now hosted by ibiblio at the University of North Carolina at Chapel Hill.

Scope of collection

Growth of Project Gutenberg publications from 1994 until 2008.

As of December 2009, Project Gutenberg claimed over 34,000 items in its collection, with an average of over fifty new e-books being added each week.[10] These are primarily works of literature from the Western cultural tradition. In addition to literature such as novels, poetry, short stories and drama, Project Gutenberg also has cookbooks, reference works and issues of periodicals.[11] The Project Gutenberg collection also has a few non-text items such as audio files and music notation files.

Most releases are in English, but there are also significant numbers in many other languages. As of July 2008, the most represented non-English languages were French, German, Finnish, Dutch, Chinese, and Portuguese.[3]

Whenever possible, Gutenberg releases are available in plain text, mainly using US-ASCII character encoding but frequently extended to ISO-8859-1 (needed to represent, for example, accented characters in French and the German scharfes S, ß). Besides freedom from copyright, a Latin character set text version of each release has been a requirement of Michael Hart's since the founding of Project Gutenberg, as he believes this is the format most likely to be readable in the extended future. Out of necessity, this criterion has had to be extended further for the sizeable collection of texts in East Asian languages such as Chinese and Japanese, where UTF-8 is used instead. The text is wrapped at 65 to 70 characters, and paragraphs are separated by a double line break. Although this makes the release available to anybody with a text reader, a drawback of this format is the lack of markup and the resulting relatively bland appearance.[12]
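
The plain-text conventions described above can be illustrated with a short Python sketch. It is not Project Gutenberg's actual tooling: the function names to_plain_text and pick_encoding, and the 70-column default, are assumptions chosen for illustration. The sketch wraps each paragraph to a fixed width, separates paragraphs with a blank line, and selects the narrowest of the three encodings mentioned (US-ASCII, ISO-8859-1, UTF-8) that can represent the text.

```python
import textwrap


def to_plain_text(paragraphs, width=70):
    """Wrap each paragraph to at most `width` columns and join
    paragraphs with a blank line (a double line break).

    `width=70` is an assumed default within the 65 to 70 range
    described in the article.
    """
    wrapped = [
        # Collapse existing whitespace so each paragraph is re-wrapped cleanly.
        textwrap.fill(" ".join(p.split()), width=width)
        for p in paragraphs
    ]
    return "\n\n".join(wrapped) + "\n"


def pick_encoding(text):
    """Return the narrowest encoding that can represent `text`:
    US-ASCII first, then ISO-8859-1, otherwise UTF-8.
    (Hypothetical helper, not part of any Project Gutenberg tool.)
    """
    for name in ("ascii", "iso-8859-1"):
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    return "utf-8"


if __name__ == "__main__":
    sample = ["When in the Course of human events " * 5, "A second paragraph."]
    body = to_plain_text(sample)
    print(pick_encoding(body))
    print(body)
```

Trying the encodings from narrowest to widest mirrors the order of preference in the text: plain ASCII where it suffices, ISO-8859-1 for accented Western European characters, and UTF-8 only when neither can represent the work.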

Other formats may also be released when submitted by volunteers. The most common non-ASCII format is HTML, which allows markup and illustrations to be included. Some project members and users have requested more advanced formats, believing them to be much easier to read. However, formats that are not easily editable, such as PDF, are generally not considered to fit the goals of Project Gutenberg, although many works are being added to the collection in PDF so that illustrations can be included in downloadable documents. For years, there has been discussion of using some type of XML, although progress on that has been slow.

Ideals

iLiad e-book reader equipped with e-paper display

Michael Hart said in 2004, "The mission of Project Gutenberg is simple: 'To encourage the creation and distribution of ebooks.'"[2] His goal is "to provide as many e-books in as many formats as possible for the entire world to read in as many languages as possible."[3] Likewise, a project slogan is to "break down the bars of ignorance and illiteracy",[13] because its volunteers aim to continue spreading public literacy and appreciation for the literary heritage, just as public libraries began to do in the late 19th century.[14][15]

Project Gutenberg is intentionally decentralized. For example, there is no selection policy dictating what texts to add. Instead, individual volunteers work on what they are interested in or have available. The Project Gutenberg collection is intended to preserve items for the long term, so that they cannot be lost through any one localized accident. To ensure this, the entire collection is backed up regularly and mirrored on servers in many different locations.

Copyright

Project Gutenberg is careful to verify the status of its ebooks according to U.S. copyright law. Material is added to the Project Gutenberg archive only after it has received a copyright clearance, and records of these clearances are saved for future reference. Unlike some other digital library projects, Project Gutenberg does not claim new copyright on titles it publishes. Instead, it encourages their free reproduction and distribution.[3]

Most books in the Project Gutenberg collection are distributed as public domain under U.S. copyright law. The licensing included with each ebook puts some restrictions on what can be done with the texts (such as distributing them in modified form, or for commercial purposes) as long as the Project Gutenberg trademark is used. If the header is stripped and the trademark not used, then the public domain texts can be reused without any restrictions.

There are also a few copyrighted texts that Project Gutenberg distributes with permission. These are subject to further restrictions as specified by the copyright holder.

External criticism

Some people have criticized Project Gutenberg for a lack of scholarly rigor in its e-texts: for example, information about the edition used is usually inadequate, and original prefaces are often omitted. However, John Mark Ockerbloom of the University of Pennsylvania noted that Project Gutenberg is responsive about addressing errors once they are identified, and the texts now include specific source edition citations.[16] In many cases the editions are also not the most current scholarly editions, because these later editions are usually not in the public domain.

While the works in Project Gutenberg represent a valuable sample of publications that span several centuries, there are some issues of concern for linguistic analysis. Some content may have been modified by the transcriber because of editorial changes or corrections (such as to correct obvious typesetting or printing errors). The spelling may also have been modified to conform with current practices (although the intent of Project Gutenberg,[17] and of Distributed Proofreaders,[1] is to preserve the original text and, where possible, the formatting). As a result, the works may be problematic when searching for older grammatical usage. Finally, the collected works can be weighted heavily towards certain authors (such as Charles Dickens), while others are barely represented.[18]

How Project Gutenberg recognizes volunteers' efforts in making classic literary works available to the public has also drawn criticism. Those who do the time-consuming work of producing and donating the initial e-text files are typically credited within the introduction. But they may feel that others who later process their donated files are being unfairly credited as "co-producers." Transforming the simple text files that have been the Project Gutenberg staple into HTML format, for instance, typically requires only a fraction of the effort.

In March 2004, a new initiative was begun by Michael Hart and John S. Guagliardo[19] to provide low-cost intellectual properties. The initial name for this project was Project Gutenberg 2 (PG II), which created controversy among PG volunteers because of the re-use of the project's trademarked name for a commercial venture.[8]

Affiliated projects

All affiliated projects are independent organizations that share the same ideals and have been given permission to use the Project Gutenberg trademark. They often have a particular national or linguistic focus.[20]

List of affiliated projects

References

  1. Hart, Michael S. "United States Declaration of Independence by United States". Project Gutenberg. http://www.gutenberg.org/etext/1. Retrieved 17 February 2007.
  2. Hart, Michael S. (23 October 2004). "Gutenberg Mission Statement by Michael Hart". Project Gutenberg. http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Mission_Statement_by_Michael_Hart. Retrieved 15 August 2007.
  3. Thomas, Jeffrey (20 June 2007). "Project Gutenberg Digital Library Seeks To Spur Literacy". U.S. Department of State, Bureau of International Information Programs. http://usinfo.state.gov/xarchives/display.html?p=washfile-english&y=2007&m=July&x=200707201511311CJsamohT0.6146356. Retrieved 20 August 2007.
  4. "Hobbes' Internet Timeline". http://www.zakon.org/robert/internet/timeline/. Retrieved 17 February 2009.
  5. Day, B. H.; Wortman, W. A. (2000). Literature in English: A Guide for Librarians in the Digital Age. Chicago: Association of College and Research Libraries. p. 170. ISBN 0838980813.
  6. Vara, Vauhini (5 December 2005). "Project Gutenberg Fears No Google". Wall Street Journal. http://online.wsj.com/public/article/SB113415403113218620-U_OqLOmApoaSvNpy5SjNwvhpW5w_20061209.html. Retrieved 15 August 2007. 
  7. "Gutenberg:Credits". Project Gutenberg. 8 June 2006. http://www.gutenberg.org/wiki/Gutenberg:Credits. Retrieved 15 August 2007. 
  8. Hane, Paula (2004). "Project Gutenberg Progresses". Information Today 21 (5). http://www.infotoday.com/it/may04/hane1.shtml. Retrieved 20 August 2007.
  9. Staff (August 2007). "The Distributed Proofreaders Foundation". Distributed proofreaders. http://www.pgdp.net/c/faq/dpf.php. Retrieved 10 August 2007. 
  10. According to gutindex-2006, there were 1,653 new Project Gutenberg items posted in the first 33 weeks of 2006. This averages out to 50.09 per week. This does not include additions to affiliated projects.
  11. For a listing of the categorized books, see: Staff (28 April 2007). "Category:Bookshelf". Project Gutenberg. http://www.gutenberg.org/wiki/Category:Bookshelf. Retrieved 18 August 2007. 
  12. Boumphrey, Frank (July 2000). "European Literature and Project Gutenberg". Cultivate Interactive. http://www.cultivate-int.org/issue1/gutenberg/. Retrieved 15 August 2007. 
  13. "The Project Gutenberg Weekly Newsletter". Project Gutenberg. 10 December 2003. http://www.gutenbergnews.org/nl_archives/2003/pgweekly_2003_12_10_part_2.txt. Retrieved 8 June 2008. 
  14. Perry, Ruth (2007). "Postscript about the Public Libraries". Modern Language Association. http://www.mla.org/resources/documents/rep_primaryrecords/repview_records/primary_records10. Retrieved 20 August 2007. 
  15. Lorenzen, Michael (2002). "Deconstructing the Philanthropic Library: The Sociological Reasons Behind Andrew Carnegie's Millions to Libraries". Modern Language Association. http://www.michaellorenzen.com/carnegie.html. Retrieved 20 August 2007. 
  16. Martha L. Brogan, Daphnée Rentfrow (2005). A Kaleidoscope of Digital American Literature. New York: Digital Library Federation. ISBN 1933645288. OCLC 61247191. 
  17. "Gutenberg Volunteers FAQ" (see V.54). 18 January 2009. http://www.gutenberg.org/wiki/Gutenberg:Volunteers%27_FAQ.
  18. Hoffmann, Sebastian (2005). Grammaticalization And English Complex Prepositions: A Corpus-based Study (1st ed.). Routledge. ISBN 0415360498. OCLC 156424479. 
  19. Executive director of the World eBook Library.
  20. Staff (17 July 2007). "Gutenberg:Partners, Affiliates and Resources". Project Gutenberg. http://www.gutenberg.org/wiki/Gutenberg:Partners,_Affiliates_and_Resources. Retrieved 20 August 2007. 
  21. Staff (24 January 2007). "Project Gutenberg of Australia". http://gutenberg.net.au/. Retrieved 10 August 2006. 
  22. Staff (1994). "Projekt Gutenberg-DE". Spiegel Online. http://gutenberg.spiegel.de/. Retrieved 20 August 2007. 
  23. Staff (2004). "Project Gutenberg Consortia Center". http://www.gutenberg.us/. Retrieved 20 August 2007. 
  24. Staff. "PG-EU". http://www.gutenberg.nl/. Retrieved 20 August 2007. 
  25. Staff. "Project Gutenberg of the Philippines". http://www.gutenberg.ph/. Retrieved 20 August 2007. 
  26. Staff. "Project Gutenberg of Taiwan". http://www.gutenberg.tw/. Retrieved 5 April 2009. 
  27. Staff (2005). "Project Gutenberg Europe". EUnet Yugoslavia. http://pge.rastko.net/. Retrieved 20 August 2007. 
  28. Kirps, Jos (22 May 2007). "Project Gutenberg Luxembourg". http://www.gutenberg.lu/. Retrieved 20 August 2007. 
  29. Riikonen, Tapio (28 February 2005). "Projekti Lönnrot". http://www.lonnrot.net/. Retrieved 20 August 2007. 
  30. "Project Gutenberg Canada". http://www.gutenberg.ca/. Retrieved 20 August 2007. 
